optimal algorithm
- Asia > Russia (0.14)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Europe > Montenegro (0.04)
- Europe > Italy (0.04)
- Government (0.45)
- Education (0.45)
- North America > Canada > British Columbia > Vancouver (0.04)
- North America > Canada > Alberta (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Russia (0.04)
- North America > Canada (0.04)
- Marketing (1.00)
- Information Technology > Services (0.93)
- Asia > Russia (0.14)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Middle East > Saudi Arabia > Mecca Province > Thuwal (0.04)
- Asia > Russia (0.14)
- North America > United States (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Russia (0.14)
- North America > United States (0.05)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- (2 more...)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Fast Asymptotically Optimal Algorithms for Non-Parametric Stochastic Bandits
We consider the problem of regret minimization in non-parametric stochastic bandits. When the rewards are known to be bounded from above, there exists asymptotically optimal algorithms, with asymptotic regret depending on an infimum of Kullback-Leibler divergences (KL). These algorithms are computationally expensive and require storing all past rewards, thus simpler but non-optimal algorithms are often used instead. We introduce several methods to approximate the infimum KL which reduce drastically the computational and memory costs of existing optimal algorithms, while keeping their regret guaranties. We apply our findings to design new variants of the MED and IMED algorithms, and demonstrate their interest with extensive numerical simulations.